Multiset Canonical Correlations Analysis of Bidimensional Intrinsic Mode Functions for Automatic Target Recognition of SAR Images

A novel feature generation algorithm for synthetic aperture radar images is designed in this study for automatic target recognition. As an adaptive 2D signal processing technique, bidimensional empirical mode decomposition is employed to generate multiscale bidimensional intrinsic mode functions from the original synthetic aperture radar images, which better capture the broad spectral information and details of the target. The combination of the original image and the decomposed bidimensional intrinsic mode functions can therefore provide more discriminative information for correct target recognition. To reduce the high dimension of the original image and the bidimensional intrinsic mode functions, multiset canonical correlations analysis is adopted to fuse them into a unified feature vector. The resultant feature vector has a much lower dimension while retaining the inner correlations between the original image and the decomposed bidimensional intrinsic mode functions, which helps improve the classification accuracy and efficiency. In the classification stage, the support vector machine is taken as the basic classifier to determine the target label of the test sample. In the experiments, the 10-class targets in the moving and stationary target acquisition and recognition dataset are classified to investigate the performance of the proposed method. Several operating conditions and reference methods are set up for comprehensive evaluation.


Introduction
With the fast progress in SAR technologies, the massive volume of radar measurements can hardly be interpreted by human operators alone. In this context, ATR systems have been developed, which comprise an integrated gallery of algorithms handling different tasks [1][2][3]. In the SAIP system [2], the well-known three-stage processing chain was designed for SAR ATR, i.e., detection, discrimination, and classification. The detector locates the ROIs by searching through a large-scene image, which may cover several square kilometers of ground. Afterwards, the discriminator performs a preliminary classification to distinguish man-made objects from natural clutter, and the ROIs assumed to be natural clutter are discarded. The classification procedure is performed last to obtain the target labels. As the core index of a SAR ATR system, the recognition performance is directly related to the classification scheme used. In the past decades, a rich collection of feature extraction algorithms and classifiers has been used for SAR ATR owing to the progress in modern pattern recognition techniques. The features, which describe the target characteristics, are first extracted from the original SAR images before the classification. The hand-crafted features for SAR ATR are obtained by inheriting optical image processing techniques as well as by considering the unique properties of SAR images, and they include geometrical, transformation, and scattering features. Geometrical features are classical ones that have long been adopted in optical image processing [4][5][6][7][8][9]. They are also used to describe the geometric properties of SAR targets such as sizes and shapes. In [4], the descriptors of the target outline, i.e., EFS, were extracted for classification.
Amoon and Clemente generated moment features, i.e., Zernike [5] and Krawtchouk [6] moments, respectively, from the binary target regions segmented from SAR images, which were afterwards classified for target recognition. Ding et al. directly matched the binary target regions of the test sample and the corresponding templates for SAR target recognition [7]. The transformation features are constructed via mathematical projection or signal processing to depict the intensity distribution or spectral properties of the original images [10][11][12][13][14][15][16][17].
Manifold learning algorithms, both linear and nonlinear, have been applied to SAR image feature extraction. Typical examples of linear methods are LDA [10], PCA [10,11], and NMF [12]. To handle the possible nonlinearity embedded in the data, some nonlinear manifold algorithms have also been developed for feature extraction [13,14]. Several signal processing techniques have been extended to image processing, such as wavelet analysis [15] and the monogenic signal [16,17]. In [16,17], the researchers introduced monogenic signal analysis to the feature extraction of SAR images, which was demonstrated to be highly effective for SAR ATR. Unlike the optical imaging mechanism, SAR images reflect the electromagnetic scattering of the target [18]. In this sense, the scattering features, e.g., attributed scattering centers, are also sufficiently descriptive to distinguish different kinds of targets, and they provide locally relevant descriptions of the target structures. The effectiveness of attributed scattering centers was experimentally investigated in some previous works [19][20][21]. The classifiers were also greatly enriched over the past decades. The SAIP system employed template matching as the baseline classifier. In [22], Zhao and Principe introduced SVM into SAR ATR, which became one of the most popular classifiers in this field [4,5,23,24]. The development of compressive sensing theory produced a robust classifier called SRC [25][26][27][28], which was used by Thiagarajan et al. to handle SAR ATR issues. Other classifiers such as AdaBoost [29], the discriminative graphical model [30], the modified polar mapping classifier [31], and HMM [9] were also investigated in the field of SAR ATR. As the deep learning methodology matures, deep classifiers for image interpretation, e.g., CNN [32][33][34][35][36][37], have been validated as highly effective for SAR ATR. However, the performance of CNN is highly dependent on the amount of available training samples.
As a remedy, some CNN-based methods sought performance enhancement via proper data augmentations [34,35].
In this paper, we propose a new way of generating discriminative features for SAR ATR via BEMD [38] and MCCA [39]. EMD, proposed by Huang et al. [40], adaptively decomposes 1D nonstationary signals. With no prior assumptions on the data properties, e.g., linearity or stationarity, EMD keeps its effectiveness and robustness under different conditions. As a natural extension, BEMD is capable of analyzing 2D signals, e.g., images, to reveal more details. Via the sifting process, the generated BIMFs provide information complementary to the original image and are thus beneficial for image interpretation tasks such as image denoising and image fusion [41][42][43][44][45][46]. Owing to these merits, we introduce BEMD to SAR feature extraction. In this way, much broader spectral properties of the original SAR images can be captured by the BIMFs for the following classification tasks. The BIMFs are also images with the same size as the original one. As a result, they significantly increase the computational burden of classification. As a feasible solution, some feature extraction algorithms could be used to reduce their dimensions independently, such as the down-sampling strategy for the multiscale monogenic components used in [16]. However, such strategies neglect the inner correlations between the original image and the BIMFs, which are also beneficial for distinguishing different classes. As a remedy, this study employs MCCA to fuse the original SAR image and the decomposed BIMFs into a unified feature vector. CCA provides a statistical way to analyze the relationship between two random variables and find the projection matrices that best preserve their correlations [47]. MCCA extends CCA to multiple random variables [48][49][50][51][52].
The resulting feature vector significantly reduces the high dimension of the original image and BIMFs while maintaining their inner correlations; thus, both the efficiency and effectiveness of the following classification can be enhanced. To perform the classification task, SVM is adopted as the classifier. SVM is one of the most popular classifiers used in SAR ATR. In the previous literature, SVM was employed to classify various kinds of features, e.g., target outline descriptors, region moments, and PCA feature vectors, with good performance. Therefore, we use SVM to classify the feature vector fused via MCCA to determine the target label of the test sample. The main contribution of this study is a novel feature extraction method based on the combination of BEMD and MCCA. The resulting features maintain the discrimination in the original image and its BIMFs with a significantly lower dimension. Therefore, the highly discriminative features can effectively improve the classification performance. The remaining sections of this paper are organized as follows. Section 2 introduces the main methodology of the proposed method including BEMD, MCCA, and SVM. In Section 3, experimental investigations are conducted on the MSTAR dataset to evaluate the performance of the proposed method. Section 4 discusses the experimental results, and Section 5 draws some conclusions to summarize this paper. The acronyms used throughout the paper are summarized in Table 1.

Methodologies of the Proposed Method
2.1. BEMD. Unlike stationary signals, nonstationary ones vary with time and are thus much more difficult to analyze reliably. Proposed by Huang et al. [41], EMD provides an adaptive way to decompose 1D signals. Unlike traditional signal decomposition algorithms, e.g., Fourier transform and wavelet analysis, EMD does not rely on predetermined basis functions but adaptively conducts the decomposition according to the properties of the data.
The IMFs produced by EMD can be used to better analyze the time-frequency properties of the original signal. Owing to its adaptivity and stability, EMD has been successfully applied to different kinds of signals, including biological, medical, and astronomical ones.
Given an original signal $f(t)$, the decomposition process of EMD (often called "sifting") is formulated as follows:

$$f(t) = \sum_{k=1}^{K} J_k(t) + r_K(t), \tag{1}$$

where $J_k(t)$, $k = 1, 2, \ldots, K$, represents the IMFs and $r_K$ denotes the residue.
To handle 2D signals such as images, Nunes et al. generalized the original EMD to BEMD. Similarly, via BEMD, an image is decomposed into several BIMFs that provide more detailed descriptions of it. According to Nunes et al. [38], for a given image I(i, j) of size M × N, the sifting process of BEMD is summarized as the following steps:
Step 1. Identify the locations of the local extrema (including the maxima and minima) in I(i, j).
Step 2. Generate the envelopes according to the maxima and minima point sets via 2D interpolation, respectively. Afterwards, the local mean m is computed as the average of the upper (from the maxima points) and lower (from the minima points) envelopes.
Step 3. Subtract the local mean from the original image to get a proto-BIMF as r = I − m. If r is judged to be a BIMF, go to Step 4. Otherwise, it is used as the input to repeat Steps 1 and 2 until the latest proto-BIMF becomes a BIMF. In EMD, the sifting is iterated according to the Cauchy-type standard deviation (SD) criterion designed by Huang et al. For BEMD, the criterion is updated as follows:

$$SD = \sum_{i=1}^{M} \sum_{j=1}^{N} \frac{\left| r_{k-1}(i,j) - r_k(i,j) \right|^2}{r_{k-1}^2(i,j)}, \tag{2}$$

where $r_k(i,j)$ is the output of the kth iteration and $r_{k-1}(i,j)$ is that of the preceding iteration. Based on the calculated SD value, the sifting process either continues or stops. When the SD is larger than a predefined threshold ε, the sifting process continues by repeating Steps 1 to 3 with $r_k(i,j)$ as the input. Otherwise, $r_k(i,j)$ is judged to be a BIMF $d_k(i,j)$ and is output. According to the analysis in [35], we set ε = 0.12 in this paper as a suitable choice of the threshold.
Step 4. Subtract the extracted BIMF from the current input, take the resulting residue as the new input, and repeat Steps 1 to 3 to obtain the next BIMF until the process can no longer be continued.
After the sifting process, the original image can be represented as the combination of the BIMFs as follows:

$$I(i,j) = \sum_{k=1}^{K} d_k(i,j) + r_K(i,j). \tag{3}$$

In equation (3), $d_k(i,j)$ denotes the kth BIMF and $r_K(i,j)$ represents the final residue. Among all the components, $d_1$ describes the highest-frequency content of I and $r_K(i,j)$ represents the lowest-frequency component. Therefore, the multiscale BIMFs provide more comprehensive descriptions of the spectral properties of the original image. In addition, some details that are not prominent in the original image are better embodied in the BIMFs. By proper use of the BIMFs, more information of the original image can therefore be exploited for interpretation tasks such as image denoising, image fusion, and image classification. These advantages of BEMD inspire us to apply it to the feature extraction of SAR images for target recognition. Figure 1 illustrates the effect of BEMD on SAR target image chips from the MSTAR dataset, in which the first three BIMFs are shown. It can be observed that some details (e.g., the dominant scattering centers) in the original images are better reflected in the first two BIMFs. In the third BIMF, the target-related descriptions become blurry. As a result, it provides very limited discrimination for target recognition. Accordingly, only the first two BIMFs are used for target recognition in the following.
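The sifting loop of Steps 1 to 4 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the paper: the envelopes are approximated with sliding max/min filters followed by mean smoothing (a swapped-in simplification in the spirit of fast BEMD variants) instead of 2D interpolation over the extrema, and the names `local_mean`, `sd_criterion`, `bemd`, the window size `win`, and the iteration cap `max_sift` are hypothetical. Only the threshold ε = 0.12 and the use of two BIMFs come from the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _filter(img, win, reduce_fn):
    """Apply reduce_fn (np.max, np.min or np.mean) over win x win windows.

    win is assumed to be odd so that the output keeps the input size.
    """
    pad = win // 2
    padded = np.pad(img, pad, mode="edge")
    windows = sliding_window_view(padded, (win, win))
    return reduce_fn(windows, axis=(-2, -1))


def local_mean(img, win=5):
    """Steps 1-2 (approximated): smoothed upper/lower envelopes and their mean."""
    upper = _filter(_filter(img, win, np.max), win, np.mean)
    lower = _filter(_filter(img, win, np.min), win, np.mean)
    return 0.5 * (upper + lower)


def sd_criterion(prev, curr, tiny=1e-12):
    """Cauchy-type SD between two successive sifting outputs, as in equation (2).

    tiny is an assumed guard against division by zero.
    """
    return float(np.sum((prev - curr) ** 2 / (prev ** 2 + tiny)))


def bemd(image, n_bimfs=2, eps=0.12, win=5, max_sift=20):
    """Return (BIMFs, residue) such that image == sum(BIMFs) + residue."""
    residue = np.asarray(image, dtype=float).copy()
    bimfs = []
    for _ in range(n_bimfs):
        proto = residue.copy()
        for _ in range(max_sift):
            new_proto = proto - local_mean(proto, win)  # Step 3: remove local mean
            sd = sd_criterion(proto, new_proto)
            proto = new_proto
            if sd <= eps:  # stop sifting once SD falls below the threshold
                break
        bimfs.append(proto)
        residue = residue - proto  # Step 4: continue on the residue
    return bimfs, residue
```

Because each BIMF is subtracted from the running residue, the outputs satisfy the reconstruction in equation (3) by construction, whatever envelope approximation is used.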

2.2. MCCA. CCA provides a statistical way to identify the association between two sets of random variables. As a generalization of CCA, MCCA is capable of analyzing the relationships among more than two sets of variables [48][49][50][51][52]. Assume that there are n random vectors $X_i \in \mathbb{R}^{p_i}$ ($i = 1, 2, \ldots, n$), each centralized so that $E(X_i) = 0$, where $p_i$ is the dimension of $X_i$. MCCA aims to find the linear combinations given by

$$U_i = \alpha_i^T X_i, \quad i = 1, 2, \ldots, n, \tag{4}$$

with the dispersion matrix as follows:

$$\Sigma_U = \begin{bmatrix} \alpha_1^T S_{11} \alpha_1 & \cdots & \alpha_1^T S_{1n} \alpha_n \\ \vdots & \ddots & \vdots \\ \alpha_n^T S_{n1} \alpha_1 & \cdots & \alpha_n^T S_{nn} \alpha_n \end{bmatrix}, \tag{5}$$

where $S_{ij}$ represents the covariance matrix between $X_i$ ($i = 1, 2, \ldots, n$) and $X_j$ ($j = 1, 2, \ldots, n$), $S_{ii}$ denotes the covariance matrix of vector $X_i$, and $\alpha_i \in \mathbb{R}^{p_i}$. MCCA searches for the projection vectors $\alpha_1, \alpha_2, \ldots, \alpha_n$ that maximize the correlations between the canonical variables $U_1, U_2, \ldots, U_n$. A measure of their correlations can be formulated in terms of the covariance matrices. Then, the measure of $\Sigma_U$ can be optimized by imposing a criterion with some constraints. One of the well-known criteria, called "SUMCOR," is given in the following equation:

$$\max_{\alpha_1, \ldots, \alpha_n} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i^T S_{ij} \alpha_j \quad \text{s.t.} \quad \alpha_i^T S_{ii} \alpha_i = 1, \quad i = 1, 2, \ldots, n. \tag{6}$$

To solve the problem in equation (6), the Lagrange multiplier technique can be used. The problem is then reformulated as the generalized eigenvalue problem in equation (7):

$$\sum_{j=1}^{n} S_{ij} \alpha_j = \lambda_i S_{ii} \alpha_i, \quad i = 1, 2, \ldots, n. \tag{7}$$

The optimal projection vectors for $X_i$ are calculated as the conjugate eigenvectors $\alpha_{i1}, \alpha_{i2}, \ldots, \alpha_{id}$ corresponding to the first $d = \min(p_1, p_2, \ldots, p_n)$ eigenvalues $\lambda_{i1} \geq \lambda_{i2} \geq \cdots \geq \lambda_{id}$ in equation (7). Afterwards, the multiset canonical correlation vectors can be obtained as the following equation:

$$Y_i = W_i^T X_i, \quad i = 1, 2, \ldots, n, \tag{8}$$

where $W_i = (\alpha_{i1}, \alpha_{i2}, \ldots, \alpha_{id})$ represents the projection matrix for each set of the random variables. As discussed above, MCCA is capable of exploiting the within-set and between-set correlations among multiple vectors. Therefore, it can be used to fuse multiple random variables to reduce the redundancy while maintaining the correlations. In this work, the serial fusion strategy (Peng et al. [48]) is adopted as the following equation:

$$Z = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} W_1^T X_1 \\ W_2^T X_2 \\ \vdots \\ W_n^T X_n \end{bmatrix}, \tag{9}$$

where Z denotes the fused feature vector.
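The projection and serial fusion described above can be illustrated with a short NumPy sketch. It is a sketch under assumptions rather than the solver used in the paper: it computes the projections from a single generalized eigenvalue problem (all sets share one Lagrange multiplier, a common simplification of SUMCOR-type MCCA), it adds a small ridge term `reg` to the within-set covariances so that they stay invertible when the vector dimension exceeds the number of samples, and the function names `mcca_fit` and `mcca_fuse` are hypothetical.

```python
import numpy as np


def mcca_fit(sets, d, reg=1e-3):
    """Estimate projection matrices W_i from a list of (n_samples, p_i) arrays."""
    means = [X.mean(axis=0) for X in sets]
    centered = [X - m for X, m in zip(sets, means)]
    n = centered[0].shape[0]
    dims = [X.shape[1] for X in centered]

    stacked = np.hstack(centered)
    S = stacked.T @ stacked / (n - 1)  # block matrix holding every S_ij
    D = np.zeros_like(S)               # block-diagonal matrix holding the S_ii
    offsets = np.cumsum([0] + dims)
    for a, b in zip(offsets[:-1], offsets[1:]):
        D[a:b, a:b] = S[a:b, a:b] + reg * np.eye(b - a)  # assumed ridge term

    # Solve S a = lambda D a by whitening with the Cholesky factor of D.
    L_inv = np.linalg.inv(np.linalg.cholesky(D))
    M = L_inv @ S @ L_inv.T
    eigvals, eigvecs = np.linalg.eigh((M + M.T) / 2)
    order = np.argsort(eigvals)[::-1][:d]  # indices of the d largest eigenvalues
    alphas = L_inv.T @ eigvecs[:, order]

    W = [alphas[a:b, :] for a, b in zip(offsets[:-1], offsets[1:])]
    return means, W


def mcca_fuse(sets, means, W):
    """Serial fusion: concatenate the projected sets, as in equation (9)."""
    return np.hstack([(X - m) @ Wi for X, m, Wi in zip(sets, means, W)])
```

In the setting of this paper, `sets` would hold the vectorized original images and their first two BIMFs (n = 3), and the projections estimated on the training samples would be reused for the test samples.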

Target Recognition via
In equation (10), $x_i$ ($i = 1, \ldots, M$) denotes a support vector from the training samples and $y_i = \pm 1$ is its corresponding label. $w_i$ ($i = 1, \ldots, M$) and b (bias) are the parameters estimated during the training. K(·) represents the kernel function. With different choices of kernel functions, the trained SVM can handle different kinds of classification problems including linear and nonlinear ones. The polynomial kernel and the RBF kernel are two typical kernel functions in SVM.
Researchers generalized the two-class SVM to the multiclass case via the one-versus-one or one-versus-rest strategies. In this way, SVM can be directly used to classify many types of targets simultaneously. The well-known LIBSVM toolbox [53] supports SVM for different usages and is also used in this work. The multiclass SVM with the RBF kernel is adopted to perform the classification tasks based on the generated features. The novel feature vectors generated by BEMD and MCCA are classified by SVM, as shown in Figure 2. Considering that the original image and BIMFs are all 2D matrices, they are first reshaped as 1D vectors. Afterwards, MCCA is employed to fuse them into a unified feature vector. In detail, the following steps are implemented to perform the target recognition task.
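To make the role of the kernel and the trained parameters concrete, the following sketch evaluates a two-class kernel SVM decision for a single sample. It assumes the common decision form $\mathrm{sgn}\left(\sum_i w_i y_i K(x_i, x) + b\right)$ with an RBF kernel; the exact placement of $y_i$ in equation (10), the kernel width `gamma`, and the function names are assumptions made for illustration, and the parameters would normally come from a trained model rather than being set by hand.

```python
import numpy as np


def rbf_kernel(x, support_vectors, gamma=0.5):
    """RBF kernel values K(x_i, x) between x and every support vector x_i."""
    diff = support_vectors - x
    return np.exp(-gamma * np.sum(diff ** 2, axis=1))


def svm_decision(x, support_vectors, labels, weights, bias, gamma=0.5):
    """Return +1 or -1 from the weighted kernel sum plus the bias."""
    score = np.sum(weights * labels * rbf_kernel(x, support_vectors, gamma)) + bias
    return 1 if score >= 0 else -1
```

A sample lying close to a positively labeled support vector receives a large kernel value from it and a negligible one from distant support vectors, so the sign of the score follows the nearby label.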
Step 1: BEMD is used to extract the multiscale BIMFs from the training samples.
Step 2: the first two BIMFs and the original image are taken as random variables to calculate the projection matrices using MCCA.
Step 3: each of the training samples and its corresponding BIMFs are fused based on the projection matrices from Step 2 to build a new training set.
Step 4: the fused feature vector of the test sample is obtained using BEMD and MCCA in the same way as for the training samples.
Step 5: the feature vector of the test sample is classified by SVM to determine the target label.

Dataset and Methods for Comparison.
The MSTAR dataset, which is widely used to develop and test SAR ATR algorithms, is employed for experimental evaluation in this study.
There are 10 representative ground targets contained in the dataset, whose optical appearances are displayed in Figure 3. SAR images of these targets are acquired by X-band sensors with a resolution of 0.3 m × 0.3 m. The aspect angles of each target cover the full range from 0° to 360° at different depression angles, e.g., 15°, 17°, 30°, and 45°. In addition, some targets (e.g., BMP2 and T72) have several different configurations with some structural modifications.
As a necessary part of validating the performance of the proposed method, some baseline algorithms that are widely used in SAR ATR are compared. Their implementation details are itemized as follows.

Results under SOC.
The proposed method is first investigated under SOC, which can be regarded as a preliminary verification. Table 2 shows the training and test samples from the 10 classes under SOC. Images at the 17° depression angle are used for training, and those at the 15° depression angle are used for testing. Specifically, the test sets of BMP2 and T72 each contain two more configurations (i.e., SNs) than their training sets. The confusion matrix in Figure 4 displays the classification results of the proposed method, in which the X and Y coordinates represent the predicted and actual labels, respectively. It shows that each class is classified with a recognition rate over 97.5%, and the overall recognition rate of the 10 targets is 99.03%. Because of the configuration differences between the training and test sets, BMP2 and T72 get the two lowest recognition rates among all the targets. According to the reported results, the high performance of the proposed method under SOC is quantitatively validated. The BIMFs generated by BEMD are discriminative features, which maintain the original target characteristics. Furthermore, MCCA combines the original image and its multiscale BIMFs into a unified feature vector while keeping the inner discrimination. Finally, as a high-performance classification scheme, SVM makes decisions on the target labels based on the fused features. All these factors contribute to the excellent performance of the proposed method under SOC.
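The per-class and overall recognition rates read from a confusion matrix can be computed as in the sketch below. The function names are hypothetical, rows index the actual labels and columns the predicted ones, and the overall rate is taken here as the fraction of correctly classified test samples, which is an assumption about how the reported average was obtained.

```python
import numpy as np


def confusion_matrix(actual, predicted, n_classes):
    """Count samples per (actual label, predicted label) pair."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm


def recognition_rates(cm):
    """Return per-class recognition rates and the overall recognition rate."""
    per_class = np.diag(cm) / cm.sum(axis=1)
    overall = np.trace(cm) / cm.sum()
    return per_class, overall
```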

Recognition under EOC.
EOCs are common situations in SAR ATR. As illustrated in Table 2, the same target may have different configurations. Moreover, due to the variations of backgrounds and sensors, other EOCs such as depression angle variance and noise corruption are also severe obstacles to the smooth implementation of SAR ATR systems. Consequently, in this part, three typical EOCs are set up to test the robustness of the proposed method. The first EOC is configuration variance, and the training and test samples are displayed in Table 3. For BMP2 and T72, the test samples come from SNs entirely different from those of the training samples. In addition, BRDM2 and BTR70 are used as two confuser targets in the training set, which further increases the difficulty of correct classification. The second EOC is depression angle variance, with the setup given in Table 4, which includes SAR images of four targets (2S1, BRDM2, ZSU23/4, and T72 (SN_A64)) from different depression angles. Samples at the 17° depression angle are used for training, and the test samples are at the 30° and 45° depression angles, respectively. The relatively large depression angle variances between the training and test samples decrease their similarities in SAR images [55], as shown in Figure 5. The third EOC is denoted as "noise corruption." Following the ideas in [16,32], the noisy samples are obtained by adding random noise to the original test images in Table 2. In detail, a certain percentage of pixels in the original SAR images are replaced by impulses, i.e., pixels with large intensities. Figure 6 illustrates some noisy samples at different noise levels.
Based on the aforementioned EOC experimental setups, the robustness of the proposed method is tested. Table 5 displays the classification results of the proposed method under EOC-1. The test configurations of BMP2 and T72 are all classified with recognition rates over 96%, and the overall recognition rate is 98.08%. As shown in Figure 1, the multiscale BIMFs can better reflect the detailed information in the original SAR image. The classification results under EOC-2 are listed in Table 6. The average recognition rates at the 30° and 45° depression angles achieved by the proposed method are 98.18% and 73.43%, respectively. The great decrease in the recognition rate at the 45° depression angle is probably caused by the low similarities between the training and test samples, which can be observed in Figure 5. Table 7 lists the average recognition rates at different noise levels, which shows the sensitivity of the recognition performance to random noise corruption. At the 20% noise level, the overall recognition rate drops to 62.12%, which is significantly lower than that on the original test samples. In summary, the proposed method is investigated under both SOC and three typical EOCs. Compared with EOCs, the performance under SOC is much better because the test samples are much more similar to the training ones. In each EOC, the classification accuracy is closely related to the degree of deterioration, which can be directly seen from the results in EOC-2 and EOC-3.
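The noise corruption described above can be simulated with a few lines of NumPy. This is a minimal sketch under assumptions: the impulse value is taken to be the maximum intensity of the image, the corrupted pixel positions are drawn uniformly without replacement, and the name `add_impulse_noise` is hypothetical.

```python
import numpy as np


def add_impulse_noise(image, noise_level, rng=None):
    """Replace a fraction noise_level of the pixels by a high-intensity impulse."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.array(image, dtype=float, copy=True)
    n_corrupt = int(round(noise_level * noisy.size))
    idx = rng.choice(noisy.size, size=n_corrupt, replace=False)
    noisy.flat[idx] = noisy.max()  # assumed impulse value: the image maximum
    return noisy
```

Calling `add_impulse_noise(chip, 0.2)` on an image chip would correspond to the 20% noise level; only the test images are corrupted in this setup, while the training set stays unchanged.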

Performance Comparison with Baseline Algorithms.
In this part, the proposed method is evaluated against the baseline algorithms under different operating conditions. Table 8 compares the performance of all the methods under SOC and EOC-1. The proposed method outperforms the others in both situations. Compared with the other three SVM-based methods, the proposed method achieves 2.11%, 2.89%, and 2.61% increments in the recognition rate over the PCA, Zernike, and EFS features under SOC, and the increments change to 3.21%, 2.94%, and 2.88% under EOC-1. The better performance achieved by the proposed method verifies the superior effectiveness of the features used in the classification stage. Therefore, the features generated via the combination of BIMFs and MCCA are more discriminative than the projection features extracted by PCA, region features (e.g., Zernike moments), and target outlines (e.g., EFS descriptors). Figure 7 compares the recognition rates of all the methods at different depression angles. Following a trend similar to that of the proposed method, the baseline algorithms all degrade sharply when the depression angle changes from 30° to 45°. At each depression angle, the proposed method achieves the highest accuracy, mainly because the generated features better reflect the local variations caused by depression angle variance. Especially at the 45° depression angle, the advantage of our approach becomes much more obvious, and the smallest increment in the recognition rate is over 3.41% in comparison with the baseline algorithms. The performance of different methods under random noise corruption is shown in Figure 8. Although decreasing as the noise level increases, the recognition rates of the proposed method remain higher than those of the baseline algorithms. So, the generated features are more robust than the other features according to the experimental results.
The Zernike and EFS features perform relatively better among the baseline algorithms because the two types of features are extracted from the binary target region, which remains more robust than the intensity distribution under random noise. The CNN-based methods are relatively more vulnerable to random noise corruption than the remaining ones. In the deep learning classification scheme, the performance depends heavily on the coverage of the training set. Under high levels of random noise, the operating conditions of the test samples cannot be covered by the original training samples. As a result, the classification accuracy decreases sharply. According to the results of the performance comparison, the superiority of the proposed method under both SOC and EOCs is validated. The main reason for the good performance lies in the high discriminability of the features generated via the combination of BIMFs and MCCA.

Discussions
Some discussions are made in this section to further explain the experimental results on the MSTAR dataset, which quantitatively verified the superior performance of the proposed method. The rationale and feasibility of the results can be discussed from the following aspects.
(1) The high discrimination capability of the feature generated via BEMD and MCCA: the multiscale BIMFs extracted by BEMD can capture broader spectral information of SAR images. As displayed in Figure 1, more details of the target can be reflected in the BIMFs than in the original image. Therefore, by ... As a result, it is a much harder classification task than SOC problems. Three typical EOCs (configuration variance, depression angle variance, and random noise corruption) are set up to comprehensively examine the robustness of the proposed method. Compared with the baseline algorithms, the proposed one achieves better performance under different types of EOCs.

Conclusions
This paper proposes a feature generation method for SAR images for target recognition. The multiscale BIMFs are first obtained from the original image by BEMD, which provide more detailed information of the target. To enhance the classification accuracy and efficiency in the following stage, MCCA is employed to combine the original image and decomposed BIMFs into a unified feature vector. Because MCCA constructs the projection matrices by considering the relationship between different components, the resulting feature vector reflects the inner correlations of the original image and decomposed BIMFs. SVM is adopted as the classifier to classify the generated feature vector. According to the experimental investigations on the MSTAR dataset under different kinds of operating conditions, several conclusions can be reached as follows: (1) The multiscale BIMFs generated by BEMD capture broader spectral information and reflect more details of the original image. So, they can complement the original image to provide more discrimination to improve the recognition performance. (2) MCCA is an effective method to combine the original SAR image and decomposed BIMFs. The generated low-dimensional feature vector contains the inner correlations between different components. (3) As an overall evaluation, the proposed method achieves better performance than some baseline SAR ATR methods. In particular, its robustness under several typical EOCs, including configuration variance, depression angle variance, and random noise corruption, is notably better. In the future, in order to better handle the uncertainties caused by EOCs, feature extraction, etc., fuzzy theory [56,57] may be a potential way to further improve the recognition performance.

Data Availability
The MSTAR dataset is publicly available.

Conflicts of Interest
The author declares that there are no conflicts of interest regarding the publication of this paper.